Large Variations in Bacterial Ribosomal RNA Genes 
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Abstract 

Ribosomal RNA (rRNA) genes, essential to all forms of life, have been viewed as highly conserved and evolutionarily stable, 
partly because very little is known about their natural variations. Here, we explored large-scale variations of rRNA genes 
through bioinformatic analyses of available complete bacterial genomic sequences with an emphasis on formation 
mechanisms and biological significance. Interestingly, we found bacterial genomes in which no 16S rRNA genes harbor the 
conserved core of the anti-Shine-Dalgarno sequence (5'-CCTCC-3'). This loss was accompanied by elimination of Shine- 
Dalgarno-like sequences upstream of their protein-coding genes. Those genomes belong to 1 or 2 of the following 
categories: primary symbionts, hemotropic Mycoplasma, and Flavobacteria. We also found many rearranged rRNA genes 
and reconstructed their history. Conjecturing the underlying mechanisms, such as inversion, partial duplication, 
transposon insertion, deletion, and substitution, we were able to infer their biological significance, such as co-orientation of 
rRNA transcription and chromosomal replication, lateral transfer of rRNA gene segments, and spread of rRNA genes with 
an apparent structural defect through gene conversion. These results open the way to understanding dynamic 
evolutionary changes of rRNA genes and the translational machinery. 
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Introduction 

Ribosomal RNA (rRNA) genes have been widely used for 
estimating phylogeny because they are present in all cells, 
share a high similarity in their conserved regions (used for 
gene detection), but vary among different lineages in 
their less conserved regions (used for phylogeny classifi- 
cation) (Woese 1987; Doolittle 1999). There are often 
multiple rRNA operons within a genome (Lee et al. 
2009). Intragenomic heterogeneity of rRNA genes has 
been observed (Acinas et al. 2004; Pei et al. 2010) in spite 
of homogenization processes of paralogous rRNA genes 
through gene conversion between rRNA operons on 
different loci (Hashimoto et al. 2003). Bacillus subtilis 
mutants that had only one rRNA operon showed varied 
sporulation abilities depending on which remained 
among the 11 rRNA operons (Nanamiya et al. 2010). 
Ribosomes with structural variations arising from the 
heterogeneous rRNA genes might play distinct roles, 
and diversity in the ribosomal functions may be advan- 
tageous to cells. 

There is a functional region on 16S rRNAs that is 
highly conserved but unique to prokaryotes: the anti- 
Shine-Dalgarno (anti-SD) sequence on their 3' tails (Shine 
and Dalgarno 1975). Direct binding of the SD sequence on 
the messenger RNA (mRNA) 5' -untranslated region 
(5'-UTR) with an anti-SD sequence initiates translation 
(fig. 1), which was first described in Escherichia coli as 
5'-ACCUCCU-3' (Shine and Dalgarno 1974). Its core 



sequence, 5'-CCUCC-3', has been known to be universal 
among all the surveyed prokaryotes (Ma et al. 2002; 
Nakagawa et al. 2010) except for Candidatus Carsonella 
ruddii, which is a primary symbiont of insects (Thao et al. 
2000). Degeneration of the anti-SD sequence suggests that 
the SD/anti-SD interaction is not the only mechanism for 
translation initiation. A well-known alternative mechanism 
is direct translation of leaderless mRNAs, which have no or 
an extremely short 5'-UTR. Such leaderless mRNAs ac- 
count for 2.2% of the Helicobacter pylori transcriptome 
(Sharma et al. 2010). In Halobacterium salinarum, leaderless 
mRNAs were much more efficiently expressed than 
mRNAs containing the SD sequence (Sartorius-Neef and 
Pfeifer 2004). Escherichia coli MazF (an ACA-specific endor- 
ibonuclease) likely induces SD-independent translation by 
producing leaderless mRNAs and 16S rRNAs without the 
anti-SD sequence (Vesper et al. 2011). 

Little effort has been made to understand variations and 
evolutionary changes of rRNA genes, which are possibly 
crucial proxies for cellular translation. In this study, varia- 
tions that span a region larger than a single base pair on 16S 
rRNA genes in eubacteria were systematically surveyed. 
Some bacterial genomes do not possess the conserved core 
of the anti-SD sequences on any of their 16S rRNA genes 
and conceivable forces for the loss are inferred and dis- 
cussed. We also describe other structural variations of 
rRNA genes, propose plausible underlying mechanisms 
through genomic comparison, and discuss their biological 
significance. 
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Fig. 1. Interaction between SD and anti-SD sequences. Interaction is 
between the mRNA 5'-UTR (for rpsB gene) and the 16S rRNA 3' 
tail in Escherichia coli. 

Materials and Methods 

Genomic Sequences 

Reference sequences of complete eubacterial genomes 
(supplementary table S1, Supplementary Material online) 
and their annotation information were downloaded from 
the National Center for Biotechnology Information website 
(ftp://ftp.ncbi.nih.gov/genomes/Bacteria/) on 26 April 
2011. 

Search for 16S rRNA Genes without the Anti-SD 
Sequence 

16S rRNA gene sequences were retrieved based on their 
registered annotations. We located corresponding regions 
of those sequences to the 3' tail of the reference 16S rRNA 
gene (a sequence of the 16S rRNA gene of £ coli str. K-12 
substr. DH10B) and assumed the located regions as the 3' 
tail. The ClustalW algorithm (Larkin et al. 2007) was used 
for aligning the sequences. When the registered annotation 
did not include the 3' tail sequence, we searched for 
it downstream. We surveyed the 3' tail for the core 
anti-SD sequence, 5'-CCTCC-3\ 

Phylogenetic Tree 

Sequences were aligned by ClustalW with default param- 
eters, and trees were constructed using the Molecular 
Evolutionary Genetics Analysis (MEGA5) alignment tool 
(Tamura et al. 2011) with the following implementations: 
maximum likelihood method, Tamura-Nei model (Tamura 
and Nei 1993), and bootstrap replication of 1,000 times. 

Analysis of SD/Anti-SD Interaction 
For every protein-coding gene, we retrieved the region —20 
to —5 nt from the start codon (based on the registered 
annotations), which was designated as the SD region. 
Schurr et al. (1993) quantified the change in free energy 
(AG) in duplex formation between the SD region and 
the 3' tail of 16S rRNA to select favorable SD sequences. 
Following this approach, other previous studies set a cutoff 
AG value for determining SD-like sequences, which was 
—3.4535 kcal/mol using the calculation by Starmer et al. 
(2006), but was —4.4 kcal/mol using the calculation by 
Ma et al. (2002). In this study, both parameters were con- 
sidered for the prediction of SD-like sequences. AG was 
measured by the FREE_SCAN algorithm (Starmer et al. 2006). 

Using this strategy, we estimated the gene fraction car- 
rying the SD-like sequences among all protein-coding genes 
for a given genome by the equation: F SD = (Number of 



protein-coding genes with the SD-like sequences)/(Num- 
ber of total protein-coding genes). F SD of each genome 
was adjusted to the SD index (dF SD ) as Nakagawa et al. 
(2010) described by the equation: dF SD = F SD — rF SD , where 
rF SD was the fraction of the SD-like sequences among 
20,000 artificial sequences (16 nt) constructed using a back- 
ground fraction of each nucleotide as the probability for 
generating a particular nucleotide. The background nucle- 
otide fraction was defined as the overall fraction of each 
nucleotide in sequences collected from —21 to —100 nt 
of all protein-coding genes in the given genome. 

Analysis of Nucleotide Bias in the SD Region 
We retrieved the upstream sequences (from —50 to —1) of 
every protein-coding gene in a given genome based on the 
registered annotations. Nucleotide fractions at specific po- 
sitions of the retrieved sequences were compared with the 
background fractions (described above), and df N (N = A, C, 
G, or T) denotes its change by the equation: d/ N = (fraction 
of nucleotide N at the specific position) — (the background 
fraction of nucleotide N). 

Search for Imperfect rRNA Genes and 
Reconstruction of Rearrangement History 
We searched for sequences with partial homology to the 
£ coli 16S rRNA gene using the Blastn algorithm 
(Altschul et al. 1990) against the complete genomic 
sequences of Eubacteria. The detected sequence and its 
flanking 2-kb regions were compared with other homolo- 
gous counterparts in the same genome or closely related 
genomes using the MEGA5 with the ClustalW algorithm 
implemented within. Although our primary targets were 
16S rRNA genes, we expanded our analysis when a rear- 
rangement event involved other adjacent genes. A dotplot 
was drawn to visualize genomic rearrangements using 
Gepard (Krumsiek et al. 2007) with a word length of 10 nt. 

Results and Discussion 

Degeneration of the Anti-SD Sequence 
Unusual Sequences on 3' Tails of 76S rRNA Genes 
As asserted in many publications (Ma et al. 2002; Chang 
et al. 2006; Nakagawa et al. 2010), SD-like sequences are 
rarely seen upstream of protein-coding genes in some 
prokaryotic genomes despite the conserved anti-SD se- 
quence on their 16S rRNA genes. This may imply that 
the SD/anti-SD interaction functions in an extremely minor 
portion of genes in those genomes. Evidence for loss of such 
interaction was reported previously in Gandidatus Carso- 
nella ruddii, as polymerase chain reaction sequencing of 
16S-23S spacer regions revealed the loss of the core 
anti-SD sequence, 5'-CCTCC-3' (referred to as the anti- 
SD motif) (Thao et al. 2000). Accelerated accumulation 
of complete genomic sequences allowed us to confirm 
the loss of the anti-SD motif not only in Gandidatus 
Carsonella ruddii but also in many other bacterial genomes 
(table 1). Fifteen among 1,182 complete genomes of Eubac- 
teria do not have the anti-SD motif in any of their 1 6S rRNA 
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Table 1. Bacterial Strains with Unusual 16S rRNA 3' Tail Sequences. 



Group 


Class 


Strain 


Number of 16S rRNA 
Genes 

w/o Anti-SD 
Total Motif 


Genomic 
Size (nt) 


Note 


1 


Flavobacteria 


Flavobacteriaceae bacterium 3519-10 


2 


2 


2,768,102 


Psych rophile 




Flavobacteria 


Riemerella anatipestifer DSM 15868 


3 


3 


2,155,121 


Pathogen of poultry 




Flavobacteria 


Weeksella virosa DSM 16922 


5 


5 


2,272,954 


Isolated from 














human urine 


2 


Flavobacteria 


Blattabacterium sp. str. Bge 


1 


1 


636,850 


Primary symbionts 




Flavobacteria 


Blattabacterium sp. str. BP LAN 


1 


1 


636,994 


of insects 




Flavobacteria 


Candidatus Sulcia muelleri CARI 


1 




276,511 






Flavobacteria 


Candidatus Sulcia muelleri DMIN 


1 




243,933 






Flavobacteria 


Candidatus Sulcia muelleri GWSS 


1 




245,530 






Flavobacteria 


Candidatus Sulcia muelleri SMDSEM 


1 




276,984 




3 


Alphaproteobacteria 


Candidatus Hodgkinia cicadicola Dsem 


1 




143,795 


Primary symbionts 




Betaproteobacteria 


Candidatus Zinderia insecticola CARI 


1 




208,564 


of insects 




Cammaproteobacteria 


Candidatus Carsonella ruddii PV 


1 




159,662 




4 


Mollicutes 


Mycoplasma haemofelis str. Langford 1 


1 




1,147,259 


Hemotropic 




Mollicutes 


Mycoplasma suis str. Illinois 


1 




742,431 


mycoplasmas 




Mollicutes 


Mycoplasma suis KI3806 


1 




709,270 





genes (herein, we refer to the 15 genomes as non-anti-SD 
genomes for simplicity). 

We categorized the non-anti-SD genomes into four 
groups (table 1) in terms of phylogeny and life style: Group 

1 consists of three strains that harbor multiple 16S rRNA 
genes and that belong to the class Flavobacteria (Raymond 
et al. 2008; Lang et al. 2011; Mavromatis et al. 2011); Group 

2 consists of six Flavobacteria strains which are primary 
symbionts (obligate and mutualistic bacteria with an an- 
cient history of host association) of insects (McCutcheon 
et al. 2009a; Sabree et al. 2009); Group 3 members are also 
primary symbionts of insects, but belong to the phylum 
Proteobacteria (Nakabachi et al. 2006; McCutcheon et al. 
2009b; McCutcheon and Moran 2010); Group 4 consists 
of three Mycoplasma strains living in erythrocytes (Barker 
et al. 2011; Guimaraes et al. 2011). 

In figures 2-4, we present phylogenetic trees based on 
full-length 16S rRNA genes of these non-anti-SD genomes 
and other references to show phylogenetic contexts of the 
loss. Groups 1 (fig. 2A), 2 (fig. 2A), and 4 (fig. 4A) clustered 
into distinctive clades, individually, implying that the first 
mutation within the anti-SD motif possibly took place in an 
ancestor of each group. Members of Group 1 share a var- 
iation: the anti-SD motif, 5'-CCTCC-3', was changed to 
5'-TCTCA-3' (fig. 2B). In the other groups, the variation 
is diverse within a group. In Group 2, the anti-SD motif, 5'- 
CCTCC-3', was changed to 5'-ICTCT-3' or 5'-rnrC[-3' 
(fig. 2B). There is divergence even within the strains in 
Candidatus Sulcia muelleri: 5'-TCTCT-3' from CARI and 
SMDSEM and S' -TUCJ^' from DMIN and GWSS. Exten- 
sively degenerated 3' tail sequences were found in Group 3; 
5'-CCTCC-3' was changed to 5'-II T <5A'3', 5'-CATTT'3', 
or 5'-TTTTT-3' (fig. 3B). In Group 4, Mycoplasma haemo- 
felis has a degenerated sequence, 5'-TCTTC-3', and the 
two A/I. suis strains have 5'-CTTTT-3', instead of the 
anti-SD motif (fig. 4B). 



Multiple Cs in the anti-SD sequence possibly are pivotal 
elements for firm RNA-RNA binding by forming C-G 
hydrogen bonds, which are stronger than the A-T bonds 
(Freier et al. 1986). The degeneration of the anti-SD motif, 
mentioned above, predominantly resulted in substitutions 
from C to T or A, thereby likely decreasing the SD/anti-SD- 
binding capacity. 

Lack of SD-like Sequences in Genomes with a Defect in the 
Anti-SD Motif 

Next, we looked for SD-like sequences in these non- 
anti-SD genomes. The widely accepted strategy to deter- 
mine SD-like sequences is to test whether the SD region 
(defined as the region —20 to —5 nt from the start codon) 
is able to form a duplex with the host's 16S rRNA 3' tail. 
There will be many combinations of duplexes. Starmer et al. 
(2006) assumed that a given SD region had an SD-like 
sequence when the lowest AG value among the combina- 
tions was lower than —3.4535 kcal/mol, whereas Ma et al. 
(2002) applied a more stringent parameter, —4.4 kcal/mol. 
Using both parameters, we obtained fraction of protein- 
coding genes with SD-like sequences (F SD ). Based on the 
work of Nakagawa et al. (2010), we used F SD after substitut- 
ing the SD fraction in artificial sequences generated based on 
its background nucleotide fraction (for details, see Materials 
and Methods). Thus, the adjusted value (dF SD ) indicates a gene 
fraction with the SD/anti-SD interaction relative to a fraction 
with random intergenic region/anti-SD interaction. A df SD 
value < 0 indicates that fewer SD regions than random inter- 
genic regions have capacities for binding to 16S rRNA 3' tails. 

Another method we applied directly showed nucleotide 
bias in the SD region. We calculated changes in the nucle- 
otide fraction (dF N ; N = A, C, G, or T) at specific positions 
before the start codon. Because the conserved anti-SD mo- 
tif is C-rich, the functional SD region should be G-rich, 
which is clearly seen in the £. coli SD regions (fig. 3D(i)). 
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Fig. 2. Comparative analysis of the anti-SD genomes and the non-anti-SD genomes in the class Flavobacteria. (A) Maximum likelihood 
phylogenetic tree. Groups 1 and 2 non-anti-SD genomes are shown in gray. (B) Predicted 16S rRNA 3' tail sequences. The regions 
corresponding to the Escherichia coii anti-SD motif are shaded in gray. (C) SD indexes (dF SD ). Triangle: cutoff value < —3.4535 kcal/mol; Dot: 
cutoff value < —4.4 kcal/mol. (D) Fractions of the four nucleotides (df A , df c , df c , and df T ) at specific positions between —50 and — 1 nt from all 
protein-coding genes in each genome. The background fraction was subtracted. For details, see Materials and Methods. 



Signal intensities of the SD-like sequences have not been 
studied in Groups 1 and 2 non-anti-SD genomes belonging 
to the class Flavobacteria (table 1). To our surprise, we 
found that members of this class showed dF SD values < 
0 and mean AG values > — 1 kcal/mol with the standard 
deviations ranging from 1.4 to 2.12, regardless of the pres- 
ence of an anti-SD motif (fig. 2C and supplementary table 
S2, Supplementary Material online). A possible explanation 
may be due to A-rich signals at the corresponding areas of 
the SD region in all surveyed Flavobacteria (fig. 2D), as op- 
posed to the G-rich signal in £ col'i, which may aid ribo- 
somal protein S1 to bind and assist in translation 
initiation as an alternative to the anti-SD sequence. In E. 
col'i, ribosomal protein S1 co-contributes to translation ini- 
tiation complex formation through its high affinity to AU- 
rich regions often observed on 5'-UTRs (Draper et al. 1977; 
Boni et al. 1991; Sengupta et al. 2001; Salah et al. 2009). The 
protein seems to assist firm binding between mRNA and 
the ribosome because it was dispensable when the SD/anti- 
SD interaction was strong (Farwell et al. 1992). The ability of 
ribosomal protein S1 to initiate translation by itself without 
the SD/anti-SD interaction has not been demonstrated. 
Another conceivable possibility is that a stretch of T-rich 
sequences right after the anti-SD sequence (fig. 2B) inter- 
acts with the mRNA A-rich region for translation initiation. 
Our 16S rRNA 3' tail prediction was based on sequence 
comparison with the reference E. col'i 16S rRNA. Thus, 
Flavobacteria 16S rRNA 3' tails could be longer (which 
may support the A-T interaction hypothesis) or shorter 
(which may dim the hypothesis) than the predicted length. 
Accurate annotation of 16S rRNA ends in the non-anti-SD 



strains, which include Flavobacteria and other taxonomic 
groups, remains to be achieved, which would give insight to 
processes underlying anti-SD motif degeneration. 

Figure 3 shows the non-anti-SD genomes (Group 3) and 
the reference genomes in the phylum Proteobacteria. Var- 
ious dF SD values (range: 0.09-0.41 with a cutoff value of 
—4.4 kcal/mol) are seen in the reference genomes, whereas 
those <0 were observed in the non-anti-SD genomes 
(fig. 3C, supplementary table S2, Supplementary Material 
online). This result is consistent with the nucleotide bias 
analysis (fig. 3D): G peaks were observed in the references 
but not in Group 3 non-anti-SD genomes. Mean AG values 
of Group 3 non-anti-SD genomes (range: —1.43 to —0.73) 
were higher than those of the references (range: —5.81 
to —1.78) (supplementary table S2, Supplementary Mate- 
rial online). The AG standard deviation values more clearly 
distinguish Group 3 (range: 1.07-1.60) from the references 
(range: 2.64-3.18). It is noteworthy that many reference ge- 
nomes we chose were host-obligate bacteria with similar 
features, in terms of host association, to Group 3 strains. 
For example, Buchnera aphidicola str. 5A, Baumannia cica- 
delimicola str. He, and Candidatus Blochmannia vafer str. 
BVAF are primary symbionts of insects, as are all Group 
3 strains. Our result supports the loss of the SD/anti-SD 
interaction among Group 3 members and also indicates 
that being a primary symbiont of insects is not necessarily 
indicative of anti-SD motif loss. 

The phylum Mollicutes, which includes the Mycoplasma 
genus, is diverse in SD-like signals (Nakagawa et al. 2010). 
Three recently sequenced hemotropic mycoplasmas 
(Group 4) that lost anti-SD motifs (fig. 4B) did not indicate 
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Fig. 3. Comparative analysis of the anti-SD genomes and the non-anti-SD genomes in the phylum Proteobacteha. (A) Maximum likelihood 
phylogenetic tree. Group 3 non-anti-SD genomes are shown in gray. (B) Predicted 16S rRNA 3' tail sequences. The regions corresponding to 
the Escherichia coli anti-SD motif are shaded in gray. (C) SD indexes (dF SD ). Triangle: cutoff value < —3.4535 kcal/mol; Dot: cutoff value < —4.4 
kcal/mol. (D) Fractions of the four nucleotides (df A , d/ 0 df G , and df T ) at specific positions between —50 and — 1 nt from all protein-coding 
genes in each genome. The background fraction was subtracted. For details, see Materials and Methods, (i) The class Gammaproteobacteria. (ii) 
The class Betaproteobacteha. (iii) The class Alphaproteobacteria. 



SD/anti-SD interactions; there were no G-rich signals or any 
indicators within the SD regions (fig. 4C and D). Among 
Mycoplasma strains with the anti-SD motif, Mycoplasma 
genitalium and Mycoplasma pneumonia, which are phylo- 
genetically closely related (fig. 4A), showed no G-rich sig- 
nals within the SD regions and had very low dF SD and high 
AG values (fig. 4C and D and supplementary table S2, Sup- 
plementary Material online). The AG standard deviation 
values (range: 2.60-3.47) of these two strains, however, 
were distinct from Group 4 (range: 1.69-1.73). In other My- 
coplasma species, the SD/anti-SD interaction appears to 
largely involve translation initiation, as considerable SD-like 
signals were observed. 

These results substantiate our assumption that the loss 
of the anti-SD motif equates to loss of SD/anti-SD interac- 
tions. Conservation of the anti-SD motif, however, does not 
always correspond to a high frequency in SD-like sequen- 
ces, as dramatically seen in Flavobacteria (with the anti-SD 
motif) with an A-rich signal prior to the start codon as well 
as in A/1, genitalium and A/1, pneumonia. We assume that SD- 
led and non-SD-led mechanisms (direct translation of 
leaderless mRNA or other unknown mechanisms) for 
translation initiation coexist in most bacteria, and SD-led 
mechanisms appear in a very small number of Flavobacteria 



(with the anti-SD motif), A/1, genitalium, and A/1, pneumonia 
genes or not at all in the non-anti-SD strains. 

Factors Responsible for the Anti-SD Motif Loss 
The loss of the anti-SD motif in primary symbionts (Groups 
2 and 3) and hemotropic mycoplasmas (Group 4) might be 
related to their intracellular characteristics. An intracellular 
lifestyle restricts the effective population size, causes fre- 
quent population bottlenecks at the time of transmission, 
and maintains copious metabolites, which promote dele- 
terious mutations and massive genomic downsizing (Toft 
and Andersson 2010). Primary symbionts and Mycoplasma 
are thought to be at the extreme of this adaptation, as they 
lack genes most free-living bacteria have (including those 
for DNA repair, recombination, and transfer) and carry the 
smallest genomes among the sequenced to date as a con- 
sequence (AAoran 2002; Dale and AAoran 2006). The SD/ 
anti-SD interaction, which is useful but not essential for life, 
might have degenerated in response to genomic minimi- 
zation. If we assume that ancestors of obligate non- 
anti-SD strains (Groups 2, 3, and 4) in a free-living state 
had multiple mechanisms for translation initiation, 
the evolutionary forces toward large-scale gene/function 
loss during a period of host association may have forced 
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Fig. 4. Comparative analysis of the anti-SD genomes and the non-anti-SD genomes in the genus Mycoplasma. (A) Maximum likelihood 
phylogenetic tree. Group 4 non-anti-SD genomes are shown in gray. (B) Predicted 16S rRNA 3' tail sequences. The regions corresponding to 
the Escherichia coli anti-SD motif are shaded in gray. (C) SD indexes (dF SD ). Triangle: cutoff value < —3.4535 kcal/mol; Dot: cutoff value < —4.4 
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genes in each genome. The background fraction was subtracted. For details, see Materials and Methods. 



the strains to sacrifice some genetic elements for the SD- 
led mechanism. 

Loss of the SD/anti-SD interaction, however, is not 
strongly correlated with reduced genomic size. Although 
Groups 2 and 3 non-anti-SD genomes are members of 
the smallest genomes among those used in this study (table 
1 and supplementary table S1, Supplementary Material 
online), SD-led translation initiation appears to work in 
B. aphidicola str. 5A, B. cicadellinicola str. He and Ca. Bloch- 
mannia vafer str. BVAF (fig. 3), which are also primary 
symbionts of insects and have genomic sizes as small as 
those in Groups 2 and 3 (supplementary table S1, Supple- 
mentary Material online). Moreover, the genomic size of 
A/l. haemofelis str. Langford 1 (Group 4) is the third largest 
among 26 Mycoplasma strains we analyzed (supplemen- 
tary table S1, Supplementary Material online and fig. 4). 
Each bacterium is on its own evolutionary stage and 
path, which makes it difficult to simplify causes for anti- 
SD motif loss. Other evolutionary forces related with host 
interactions, such as immune evasion, unique intracellular 
environments (e.g., within an erythrocyte), and lineage- 
specific inherited features of translation processes, may 
have played a role in this loss. 

Another feature relevant to anti-SD motif loss is being 
a member of the class Flavobacteria. No primary symbionts 
in this class that has been completely sequenced to date 
possess an anti-SD motif. Three non-anti-SD genomes that 
do not show extreme genomic reduction and have multiple 
rRNA operons (Group 1) also fall within this class (table 1). 
The distinctive A-rich pattern before a start codon (fig. 2D) 



represents an unknown alternative mechanism for transla- 
tion initiation (which might be enabled by ribosomal pro- 
tein S1 or a novel form of rRNA-mRNA interaction as 
described above) that is superior to the SD-led mechanism 
in Flavobacteria. Flavobacteria seem prone to the loss of the 
anti-SD motif due to the weak dependence on the SD/anti- 
SD interaction compared with other bacterial groups. 

Inferred Mechanisms of Rearranged rRNA Genes 
We also analyzed structural variants of rRNA genes with an 
emphasis on their underlying formation mechanisms and 
possible biological significance to elucidate rRNA gene 
evolution. Among 1,182 complete eubacterial genomes we 
surveyed, 15 (1.3%) carried rearranged 16S rRNA genes 
formed by various mechanisms (supplementary table S3, 
Supplementary Material online). 
Inversion of the rRNA Operon 

Splitting of 16S rRNA genes associated with inversion was 
found. Yersinia pestis biovar Microtus 91001 has undergone 
extensive genomic rearrangements and disruptions when 
compared with the genome of V. pestis KIM 10 (fig. 5A) 
(Song et al. 2004). 

The KIM 10 genome (fig. 5B(i)) harbored rRNA operons 
in the same direction as chromosomal DNA replication, as 
do many bacterial genomes (Rocha 2002; Price et al. 2005), 
which is probably because head-to-head encounters of re- 
plisomes and RNA polymerases disturb genomic replica- 
tion (Liu and Alberts 1995; Wang et al. 2007; Srivatsan 
et al. 2010), while bacteria seem to have acquired mecha- 
nisms to cope with a codirectional collision (Pomerantz 
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Fig. 5. Inversion of rRNA operons. (A) Dotplot between genomes. (B) Reconstruction. (C) Recombination sites in one operon. (D) 
Recombination sites in the other operon. Amino acid names indicate the corresponding tRNA genes. 



and O'Donnell 2008). A chromosomal region of approxi- 
mately 220 kb carrying two rRNA operons was somehow 
translocated to the other "arm" of the chromosome across 
the replication origin in the same orientation (fig. 5B(ii)). (A 
possible mechanism for this translocation is two rounds of 
inversion.) Then, the two rRNA operons lay in opposite di- 
rection of DNA replication. Inversion of each operon within 
this region corrected their relative orientation and avoided 
the conflict in biovar Microtus 91 001 -type genomes (fig. 
5B(iii)). 

The above scenario is based on our dotplot analyses 
(fig. 5A). Further sequence comparison (fig. 5C and D) re- 
vealed that the inversion was likely mediated by recombi- 
nation at short (3-5 nt) similar or identical DNA 
sequences. One recombination site is located at the 5' 
end of the 16S rRNA gene, and the product has split of 
the 5' end (fig. 5C and D). We assumed that the split re- 
sulted in a negative functional effect on these 16S rRNA 
genes because the short split region (about 20 nt from 
the 5' end) plays an important role in connecting the 
16S rRNA's 5' domain to the 3' major domain (Wimberly 
et al. 2000), hence the split region (the shorter one) is highly 
conserved and frequently used as a universal priming site 
(Turnbaugh et al. 2009; Unno et al. 2010). 

Partial Duplication 

We observed a novel rearrangement pattern that resem- 
bles internal duplications within an rRNA operon with 
an insertion of a short DNA between the duplicates (fig. 6). 



Figure 6A illustrates the rearrangements (fig. 6A(iii)) and 
two possible underlying mechanisms (fig. 6A(i) and (ii)). In 
the first pathway (fig. 6A(i)), a short (21 nt) DNA segment 
(or its larger ancestral form) (black box) was inserted into 
the genome with a duplication of a long sequence (white 
arrow) on the target. Such mechanisms of "insertion 
with long target duplication" have been recognized for 
restriction-modification systems and other DNAs (Nobusato 
et al. 2000; Furuta et al. 2010, 2011), but not yet for short 
DNAs. In the second possible pathway (fig. 6A(ii)), tandem 
duplication of a DNA segment (black box and white arrow) 
on a 16S rRNA gene occurs first and is followed by homol- 
ogous recombination with the same locus on another ge- 
nome (or with another 16S rRNA gene locus on the same 
genome or a different genome). If the recipient DNA in 
the homologous recombination is so diverged from the do- 
nor DNA that it lacks the sequence corresponding to the 
black box (fig. 6A(ii)), the final product will show the ob- 
served pattern: direct repeats (white arrows) interrupted with 
a short DNA (black box). 

The likely end product was observed in Actinobacillus 
pleuropneumonias L20, where two direct duplicates 
(270 nt) were detected with a DNA of unknown origin 
(21 nt) between them (fig. 6A(iii)). Because the origin of 
the 21 -nt DNA was not identified through our homology 
search, we put more weight on the first hypothesis of in- 
sertion with long target duplication for this rearrangement. 

Figure 6B(i) illustrates an extended version of the 
mechanism described in figure 6A(ii). Two events of 
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Fig. 6. Partial duplication of rRNA genes. (A) Gene rearrangement in Actinobacillus pleuropneumoniae L20. (i) A proposed mechanism for the 
rearrangement: insertion with long target duplication, (ii) The other mechanism: tandem duplication followed by homologous recombination, 
(iii) The rearrangement. (B) Gene rearrangement in Bacillus thuringiensis BMB171. (i) A proposed mechanism for the rearrangement, (ii) The 
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tandem duplication nearby (A~B to AA—BB) are followed 
by a substitution in the inner duplicates (AA—BB) with its 
corresponding region in another rRNA operon that has 
partial divergence at the regions homologous to the end- 
points of the recipient's duplicated region. The reconstruc- 
tion was inferred from a rearrangement pattern in the 
Bacillus thuringiensis BMB17 rRNA operons (fig. 6B(ii)). 
One duplication event occurred at the 16S rRNA gene 
3' end and the other at the 23S rRNA gene 5' end. Short 
diverging DNAs (white and black box) did not show any 
significant similarity with any region in the B. thuringiensis 
Al Hakam genome, but the DNAs and their flanking regions 
exactly matched to rRNA operons of Bacillus cereus ATCC 
14579 (fig. 6B(ii)). This suggests an interspecific lateral 
transfer of an rRNA operon fragment. Our hypothesis 
states that two tandem duplications occurred at an rRNA 
operon in the B. cereus ATCC 14579-type genome, and the 



duplicated region was substituted by an rRNA operon of 
the Al Hakam-type genome, as we described above, to 
produce the BMB171-type operon. 

An Incomplete 76S rRNA Gene on a Plasmid 
Figure 7A illustrates a plausible scenario of lateral transfer 
of a partial 16S rRNA gene via a plasmid, followed by its 
partial incorporation into a homologous region in the chro- 
mosome through homologous recombination. Such a sce- 
nario was proposed for Lactococcus lactis cremoris SK1 1, 
which carries a partial 16S rRNA gene in plasmid 3 (fig. 7B) 
(Siezen et al. 2005). The gene on the plasmid and those on 
the chromosome differ in two regions (fig. 7B and C). Figure 
7C compared the diverged positions with their homolo- 
gous sequences on another genome (L lactis cremoris 
MG1363) in the same subspecies cremoris and on a genome 
(L lactis lactis 111403) in a different subspecies lactis. On 



A 

On lactis 111403 



I 



B 



Laterally transferred 
by plasmid 



On plasmid 41 | I" 

Homologous recombination 

On cremoris SK 11 {~T V 

I " 

On cremoris SK11 



Lactococcus lactis 
cremoris SK1 1 



Lactococcus lactis 
cremoris SK1 1 
plasmid 3 



;2042645 


16S J 




2042005 










27611 


26984 



> 



(one locus) 

C 

cremoris MG1363 (all loci) 
cremoris SK1 1 (5 loci) 
cremoris SK11 (1 locus) 
cremoris SK11 plasmid 3 (partial) 
lactis 11 1403 (all loci) 



GAGCG TGAAG TTGGTGCTTG ACC ATTTGA GAGCA 
GAGCG TGAAG TTGGTGCTTG ACC ATTTGA GAGCA 
GAGCG TGAAGGTTGGT CTT TACC A TGGATGAGCA 
GAGCG TGAAGGTTGGT CTTGTACC A TGGATGAGCA 
GAGCG TGAAGGTTGGT CTTGTACCGA TGGATGAGCA 



CATAATAACTTTAAACATAAGTT 
CATAA AACTTTAAACATAAGTT 
CATAA AACTTTAAACATAAGTT 
CATAA AACTTTAAACA AAGTT 
CATAA AACTTTAAACA AAGTT 



Fig. 7. An incomplete 16S rRNA gene on a plasmid. (A) Reconstruction. (B) The rearrangement. Two diverged regions between the two genes 
are boxed. (C) Sequence comparison in the diverged regions. 
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Fig. 8. Transposon insertion. (A) An insertion into a 16S rRNA gene. 
(B) Another insertion into a 16S rRNA gene. Black arrows: target 
duplication. Gray arrows: incomplete inverted repeats at the 
transposon ends. 

"region 1," there were nine diverged positions among the 
genes, through which two classes can be distinguished (fig. 
7C). All cremoris MG1363 16S rRNA loci and five of six loci 
on the cremoris SK1 1 chromosome comprise one class with 
complete sequence identity, whereas one locus on the 
chromosome and one on the cremoris SK11 plasmid com- 
prise the other class with all lactis 11 1403 loci. This suggests 
that partial gene was transferred from the lactis 111403- 
type to the cremoris MG1363-type genome by the plasmid, 
and homologous recombination replaced the region 1 on a 
locus with that in the partial gene. "Region 2" in the partial 
gene is also identical to that in the lactis 111403-type but 



not to that of the other cremoris-type genes (fig. 7C), sug- 
gesting that homologous recombination took place outside 
of region 2. Significance of the variation in the translation 
process and in the biology of the plasmid has yet to be 
examined. 

Transposon Insertion 

Recent transposon insertions into 16S rRNA genes retain- 
ing duplicated target sequences and imperfect inverted re- 
peats at the transposon ends were found in two distantly 
related species (fig. 8). The insertion in Corynebacterium 
aurimucosum ATCC 700975 occurred near the boundary 
between the 5' and central domains (fig. 8A), whereas that 
in Thermus scotoductus SA-01 (fig. 8B) occurred near the 
boundary between the central and 3' major domains. 
We do not know whether these split genes are functional 
or expressed. 

Deletion 

One often 16S rRNA genes in Clostridium difficile CD196 
has lost a segment of 101 nt near its 3' end. Sequence 
similarity (9 nt) within the gene seems responsible for 
the recombination-mediated deletion (fig. 9A). The lost re- 
gion comprises about a half of the major projection in the 
16S rRNA 3' minor domain, which dictates 30S and 50S 
subunit interactions. The region that includes 3/4 of 
a 23S rRNA gene and 1/2 of a 16S rRNA gene has been 
deleted together with the intergenic region in the Candi- 
dates Protochlamydia amoebophila UWE25 rRNA operon 
(fig. 9B), as previously reported (Pei et al. 2010). 
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Fig. 9. Other modes of rRNA gene rearrangements. (A) Internal deletion in a 16S rRNA gene. (B) Deletion in an rRNA operon. (C) Substitution. 
An amino acid name indicates the corresponding tRNA gene. (D) Inversion within an rRNA operon. 
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Table 2. Distribution of Structural Variants among Multiple rRNA Genes within a Genome. 



Strain 


Rearrangement Type 


Number of rRNA Operons 
Total Rearranged 


Note 


Bacillus thur'mgiensis BMB171 


Partial duplication 


14 14 


Figure 6B 


Actinobacillus pleuropneumonias L20 


Partial duplication 


6 3 


Figure 6A 


Lactobacillus reuteri JCM 1112 


Inversion in an rRNA operon 


6 2 


Figure 9D 



Substitution 

One Bacillus amyloliquefaciens FZB42 16S rRNA gene is al- 
ready known for being partial (Pei et al. 2010). By genomic 
comparison with B. amyloliquefaciens DSM7 and a homol- 
ogy search against its own genome, we revealed that this 
could be explained by DNA substitution (fig. 9C). Substitu- 
tion through recombination requires sequence similarity at 
two locations, 5' and 3'. In our scenario, similarity at the 5' 
end is provided by transfer RNA (tRNA) genes (Arg-Pro 
tRNA) and that at the 3' end by a short DNA sequence, 
5'-GAG-3\ The region enclosed by the similar sequences 
was about more than a half of a 16S rRNA gene in one du- 
plex and Ala-Met tRNA in the other duplex (fig. 9C). The 
substitution brought about addition of the Ala and Met 
tRNA genes and partial deletion of the 16S rRNA gene 
in FZB42. 

Another Inversion Event 

In Lactobacillus reuteri JCM 1112, inversion separated the 
majority of the 16S rRNA gene from its short part (about 30 
nt) at the 3' end (fig. 9D) and also separated the majority of 
the 23S rRNA gene from its short part (about 10 nt) at the 
5' end. The former short part constitutes the minor pro- 
jection of 3' minor domain. Sequence similarity at the 
break points is not clear. Biological significance of this event 
is also not clear. 

Spread of rRNA Gene Variants through Gene Conversion 
Structural variants with nearly identical sequences are 
found in multiple rRNA operons within a genome (table 2), 
suggesting that these rRNA gene variants have spread to 
other rRNA operon loci through gene conversion. The re- 
arrangement in the B. thur'mgiensis BMB171 genome 
(fig. 6B) leaves intact RNA-coding regions, and all the 
rRNA operons in the genome share the same mutant form 
(table 2). The rearrangement in A. pleuropneumoniae L20 
resulted in a large structural alteration of the gene (fig. 6A), 
and this mutant allele has spread to only two more loci out 
of five other rRNA operon loci (table 2). The mutant form 
in L reuteri JCM 1112 (fig. 9D) was found in two of six rRNA 
operon loci (table 2). 

Gene conversion between rRNA operons was proposed 
to be responsible for homogenizing the operons and retain- 
ing conserved rRNA structures (Hashimoto et al. 2003). 
Gene conversion may also help in incorporating foreign 
rRNA genes (Yap et al. 1999). Our report implies that gene 
of rRNA gene conversion can propagate mutants with 
a structural defect to other loci, which was up to a half 
of the loci within a genome in this study. 



Inactivation of some rRNA genes within a genome may 
not be crucial for cell growth. In E. coli, only five among 
seven rRNA operons were sufficient to compete against 
a wild-type strain under stable growth conditions (Condon 
et al. 1995). The smaller number of available rRNA genes, 
however, could be disadvantageous to bacterial adaptation 
in fluctuating growth conditions (Klappenbach et al. 2000; 
Stevenson and Schmidt 2004). It would be a good subject 
for experimental studies to investigate the fate and pheno- 
typic relevance of those rearranged rRNA genes. 

Conclusion 

In the present study, we revealed novel evolutionary histo- 
ries and variations of bacterial rRNA genes. The loss of the 
anti-SD motif in all of 16S rRNA genes within some bacte- 
rial genomes (table 1) indicated that the SD/anti-SD inter- 
action is not a universal translation initiation mechanism in 
prokaryotes. The loss is seen in many minute genomes that 
are obligate to eukaryotic hosts (hemotropic Mycoplasma 
and primary symbionts of insects), implying some relation- 
ship between the loss and extreme genomic reduction of 
the obligate bacteria. Flavobacteria rarely rely on the SD-led 
translation initiation (fig. 2), thus the anti-SD motif might 
start degenerating even in non-host-associated bacteria. 
We also described how the rRNA genes rearrange. One 
of the rearrangements (fig. 5) supports the earlier concept 
of necessity of co-orientation of rRNA operon with chro- 
mosomal replication. Other reconstructions (fig. 6) sug- 
gested partial rRNA gene duplication followed by gene 
conversion to other loci, some of which occurred across 
the special boundary. rRNA genes, likely laterally trans- 
ferred via a plasmid, can play a role in intragenomic rRNA 
sequence heterogeneity (fig. 7). We showed that gene con- 
version sometimes spreads defective forms of rRNA genes 
to other loci (table 2), thereby reducing available numbers 
of rRNA operons. These structural variants could provide 
novel targets for evolutionary and functional studies of 
rRNAs and translation. 

Supplementary Material 

Supplementary tables S1-S3 are available at Molecular 
Biology and Evolution online (http://www.mbe. 
oxfordjournals.org/). 
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